The contact network of patients in a regional healthcare system 
Yet in spite of advances in hospital treatment, hospitals continue to be a breeding ground for several airborne
diseases and for diseases that are transmitted through close contacts, such as SARS, methicillin-resistant
Staphylococcus aureus (MRSA), norovirus infections and tuberculosis (TB). Here we extract contact networks for up to
295,108 inpatients for durations of up to two years from a database used for administering a local public healthcare
system serving a population of 1.9 million individuals. Structural and dynamical properties of the network that are
important for the transmission of contagious diseases are then analyzed by methods from network epidemiology.
The contact networks are found to be largely determined by an extreme (age-independent) variation in the
duration of hospital stays and by the hospital structure. We find that the structure of contacts between inpatients
exhibits properties, such as a high level of transitivity (1), assortativity (2) and variation in number of
contacts (3), that are likely to be of importance for the transmission of less contagious diseases. If these
properties are considered when designing prevention programs, the risk for and the effect of epidemic outbreaks
may be decreased.



I. LIMITATIONS OF TRADITIONAL EPIDEMIOLOGY 

A central parameter within infection epidemiology is the
basic reproduction number R_0 (3). R_0 is used to estimate
whether a disease is contagious enough to generate an
epidemic in a specific population. In its simplest form, R_0 is
defined as the expected number of individuals that an infected
individual will infect in a completely susceptible population.
If all individuals in a population have approximately the same
number of contacts, and the probability that any pair of
individuals will meet is equal, R_0 can be estimated by:

R_0 = c \beta D, \qquad (1)

where c stands for the number of contacts per time unit, β for
the likelihood of passing on an infection per contact, and D for
the average time an individual is infectious (measured in the
same time unit as c). For an epidemic to be possible, an infected
person must infect more than one person on average. The
threshold value for epidemics is therefore R_0 = 1.

Studies have shown that R_0 in its simplest form may yield
misleading results. Anderson and May have demonstrated that
a great variation in number of contacts may compensate for
a low average number of contacts (3). This is because
individuals with many contacts have a far greater probability of
becoming infected, and of passing on an infection. Therefore,
in populations with great variation in number of contacts, R_0
should be calculated as:

R_0 = c \beta D \left( 1 + \frac{\sigma^2}{c^2} \right), \qquad (2)

where σ^2 denotes the variance in the number of contacts.
Another reason that R_0 can be an oversimplification is that
most contact networks studied are known to differ
significantly from random interaction.
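
To make the difference between Equations (1) and (2) concrete, the following minimal sketch evaluates both for two hypothetical populations with the same mean number of contacts. The function names and all numerical values (contact lists, β, D) are illustrative assumptions, not quantities taken from the data.

```python
from statistics import mean, pvariance

def basic_r0(c, beta, duration):
    """Equation (1): R_0 = c * beta * D under homogeneous mixing."""
    return c * beta * duration

def corrected_r0(contacts, beta, duration):
    """Equation (2): R_0 = c * beta * D * (1 + sigma^2 / c^2)."""
    c = mean(contacts)             # average number of contacts per time unit
    sigma2 = pvariance(contacts)   # population variance of the number of contacts
    return c * beta * duration * (1 + sigma2 / c ** 2)

# Two hypothetical populations, both with c = 2 contacts per day.
uniform = [2, 2, 2, 2, 2]
skewed = [0, 0, 0, 0, 10]
beta, duration = 0.05, 5           # illustrative values only

r0_uniform = corrected_r0(uniform, beta, duration)  # no variance, same as Eq. (1)
r0_skewed = corrected_r0(skewed, beta, duration)    # variance 16 gives a factor of 5
```

In this toy case the homogeneous population stays below the threshold R_0 = 1, while the skewed population with the same mean exceeds it.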



II. NETWORK STRUCTURE AND DISEASE DYNAMICS 

Many contact networks are characterized by a high level
of transitivity; that is, the number of triangles in the network
is much larger than is found in an average network having
the same frequency distribution of number of contacts (1). A
large clustering coefficient tends to slow down epidemics
because the probability that an infected person's contacts will
already be infected is very high in such a network (6s; 0). A
common way to estimate clustering in a network is to estimate
its relative number of triangles, or more exactly, to calculate
the fraction C of all connected triples of vertices in the
network that form a triangle:

C = \frac{3 n_{triangle}}{n_{triple}},

where n_triangle is the number of triangles (fully connected
subgraphs of three vertices) and n_triple is the number of triples
of vertices connected by two or three contacts, counted once for
each vertex that links the other two. The factor three
is needed to normalize C to the interval [0, 1].
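
The definition of C can be illustrated with a short sketch. It assumes the network is stored as a dictionary mapping each vertex to the set of its neighbours; the function name and the example graph are hypothetical. Triples are counted once per centre vertex, so every triangle closes three of them, which is where the factor three comes from.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Return C = 3 * n_triangle / n_triple for an undirected graph.

    adj maps each vertex to the set of its neighbours.
    """
    triples = 0   # pairs of neighbours of a common centre vertex
    closed = 0    # triples whose two end vertices are also linked
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    # Each triangle is seen from its three centres, so closed == 3 * n_triangle.
    return closed / triples if triples else 0.0

# A triangle (1, 2, 3) with one extra contact between 3 and 4.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
c_example = clustering_coefficient(example)   # 3 closed triples out of 5
```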

Another difference between contact and random networks
is that most contact networks are assortative by number of
contacts (2). This means that individuals who have many
contacts tend to have contact with other individuals who also have
many contacts, and vice versa. The number of contacts is
usually referred to as "degree" in network theory, and we will
use this term here. High assortativity decreases the epidemic
threshold value among individuals who have many contacts.
The standard measure is the assortative mixing coefficient r;
that is, the symmetric correlation coefficient between the
individuals' degrees on each side of all contacts:

r = \frac{4 \langle k_1 k_2 \rangle - \langle k_1 + k_2 \rangle^2}{2 \langle k_1^2 + k_2^2 \rangle - \langle k_1 + k_2 \rangle^2},

where k_i is the degree of the i-th argument in a list of the
contacts (0) and \langle \cdot \rangle denotes the average over
all contacts.
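
As an illustration of the coefficient r, the sketch below computes the symmetric degree correlation from a plain edge list, using the standard form r = (4<k1 k2> - <k1 + k2>^2) / (2<k1^2 + k2^2> - <k1 + k2>^2) with averages taken over all contacts. The function name and the two small example graphs are assumptions made for illustration.

```python
def assortativity(edges):
    """Symmetric degree correlation r over an undirected edge list.

    Returns None when all edge endpoints have the same degree, because the
    denominator (four times the endpoint-degree variance) is then zero.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_sum = sum(degree[u] + degree[v] for u, v in edges) / m
    mean_sq = sum(degree[u] ** 2 + degree[v] ** 2 for u, v in edges) / m
    denom = 2 * mean_sq - mean_sum ** 2
    if denom == 0:
        return None
    return (4 * mean_prod - mean_sum ** 2) / denom

# A star: the hub (degree 3) only meets leaves (degree 1).
star = [(0, 1), (0, 2), (0, 3)]
# Two separate fully connected groups of different size.
groups = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y")]

r_star = assortativity(star)      # perfectly disassortative
r_groups = assortativity(groups)  # perfectly assortative
```

The star gives r = -1, while the two disjoint groups give r = 1 because the degrees on both sides of every contact are equal.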

III. CONTACT NETWORKS OF PATIENTS 

We will now construct networks generated from a unique 
database consisting of 295,108 individuals who were regis- 
tered as "in-patients" at any hospital in Stockholm county 
(pop. 1.8 million) during 2001 and 2002. 

We want a contact to represent closeness in space and time.
Our spatial requirement is that two patients should be on the
same ward. For the temporal aspect we let closeness be a
network parameter, and we regard a contact between patients as
established if they were hospitalized on the same ward for a
duration t_ol (overlap time) or longer. Setting t_ol = 0 means that
contacts between a patient who was admitted on the same day as
another patient was discharged are included. Furthermore, we
let the sampling time window size Δt be another parameter.
The two parameters t_ol and Δt yield different networks that
are relevant to different diseases. For example, for diseases
such as measles, SARS, and norovirus, which spread rapidly,
a narrow time window will be appropriate (8); for diseases
requiring prolonged contact for transmission, like tuberculosis,
the relevant network is represented by a larger t_ol.
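
The contact definition can be sketched in a few lines. The record layout (patient, ward, admission day, discharge day, with days as integers), the function name, and the way stays are clipped to the sampling window are assumptions for illustration; the actual database format is not described here. The overlap of two stays is taken as the earlier discharge day minus the later admission day, so that t_ol = 0 includes same-day admission and discharge.

```python
from collections import defaultdict
from itertools import combinations

def contact_network(stays, t_ol, window_start, window_end):
    """Return the set of patient pairs that shared a ward for at least t_ol days.

    stays: iterable of (patient, ward, admit_day, discharge_day).
    Only the part of each stay inside [window_start, window_end] is used
    (an assumption about how the sampling window is applied).
    """
    by_ward = defaultdict(list)
    for patient, ward, admit, discharge in stays:
        start, end = max(admit, window_start), min(discharge, window_end)
        if start <= end:
            by_ward[ward].append((patient, start, end))

    edges = set()
    for ward_stays in by_ward.values():
        # Quadratic in the number of stays per ward; fine for a sketch.
        for (p, s1, e1), (q, s2, e2) in combinations(ward_stays, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if p != q and overlap >= t_ol:
                edges.add((min(p, q), max(p, q)))
    return edges

# Hypothetical records: three patients share ward W1, one is alone on W2.
records = [("a", "W1", 1, 5), ("b", "W1", 5, 9), ("c", "W1", 3, 8), ("d", "W2", 1, 9)]
all_contacts = contact_network(records, t_ol=0, window_start=1, window_end=10)
long_contacts = contact_network(records, t_ol=3, window_start=1, window_end=10)
```

With t_ol = 0 all three patients on W1 are linked, whereas t_ol = 3 keeps only the pair whose stays overlap by three days.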

IV. NETWORK STRUCTURAL PROPERTIES 

A. Degree distribution 

We will start by plotting the probability density function 
P(k) for an individual to have k contacts. This function is 
plotted for, respectively: 

• One weekday in January. 

• The entire month of January. 

• The first six months of 2001. 

• The whole period 2001-2002. 

In Figure 1 we see the development of this contact
structure from an exponential distribution to a degree distribution
with a truncated "fat tail." It is clear that variation in the
number of contacts between individuals increases with time.
Figure 1 also shows the degree to which R_0 calculated using
Equation 1 must be compensated according to Equation 2 for this
increase in variation. The skewed degree distribution in our case is
related to the power-law-like distribution of hospitalization
times (see Figure 2 and Appendix B). The distribution of
hospitalization will indirectly lead to "preferential attachment" HUol)
(that is, a heightened probability of high-degree vertices to
form additional contacts), a well known mechanism for
producing fat-tailed degree distributions.

FIG. 1 The degree distribution and its effect on the reproduction
number. In the upper panel we see the probability density function
P(k) versus k (with logarithmic binning) for networks with overlap
t_ol = 0 and different time windows. The lower panel shows the
correction to the basic reproduction number R_0 as a function of the time
window size for t_ol = 0. For example, for Δt = 2 years an epidemic
can be caused by a disease five times less transmissible than predicted by
traditional models.

FIG. 2 The probability that any hospitalization lasts Δt or longer.
In A, all data are used; in B, the patients over 60 years are removed
from the data set.
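
The logarithmic binning mentioned for P(k) can be illustrated as follows. This is a minimal sketch under stated assumptions: bins are taken as [2^i, 2^(i+1)), vertices without contacts are left out, and the function name and example degrees are hypothetical. The bin boundaries used for the figure are not given in the text.

```python
from collections import Counter

def log_binned_density(degrees):
    """Return a list of (lower, upper, density) for bins [2**i, 2**(i+1)).

    The density is the fraction of vertices in the bin divided by the number
    of integer degrees the bin covers, so that it estimates P(k).
    """
    positive = [k for k in degrees if k >= 1]
    n = len(positive)
    # For a positive integer k, k.bit_length() - 1 equals floor(log2(k)).
    counts = Counter(k.bit_length() - 1 for k in positive)
    result = []
    for i in sorted(counts):
        lower, upper = 2 ** i, 2 ** (i + 1)
        result.append((lower, upper, counts[i] / (n * (upper - lower))))
    return result

# Hypothetical degree sequence.
bins = log_binned_density([1, 1, 2, 3, 4, 8, 9])
```

Wider bins at high degree collect the sparse counts in the tail, which is what makes a fat tail visible on a log-log plot.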



B. Transitivity and assortativity 

We will now investigate how transitivity, C, and
assortativity, r, vary with sampling time, Δt, and overlap time, t_ol.

In Figure 3(A) we display the clustering coefficient, C, and
in Figure 3(B) the level of assortative mixing, r, as functions
of our time intervals t_ol and Δt. We note that both parameters
exhibit a very similar functional form. C and r both decrease
with Δt when t_ol is held constant for small t_ol, but increase when
t_ol is held constant for large values of t_ol. Both parameters
behave in a similar way when Δt is held constant. For small
values of t_ol, the parameters first increase and then start to
decrease as a function of t_ol. For large values of t_ol, both
parameters first increase with t_ol until they converge at a high level
of clustering and assortativity.

FIG. 3 The clustering (A) and assortative mixing (B) coefficients, C and r, as functions of sampling time, Δt, and overlap time, t_ol.

The estimated high levels for the C and the r parameters,
and the resemblance in functional form between the r-surface
and the C-surface in Figure 3, are a consequence of the
compartmental structure of the healthcare system. Assume a
hypothetical network, Ω, in which all inpatients stay on the same
ward during the entire duration of Δt. In such a network,
each inpatient will have a link to each other inpatient on the
same ward. The level of clustering between the inpatients
will therefore equal 1, both locally on each ward and
globally throughout the whole healthcare system, because no links
exist between inpatients on different wards in Ω. The level of
assortativity can only be defined if there is
a variation in ward size, and r in these cases must equal
1 because the degrees on each side of each link
will be equal. Our results show that networks defined by a
large value of t_ol and a large value of Δt come very close to Ω.
Both C and r are large, and the vast majority of individuals in
these networks are registered as inpatients only once per ward
(see supplementary material). If we relax the restriction on
Ω that each inpatient stays on the same ward during the
entire period of Δt, the C and r parameters may drop below
1. This makes it possible for triples that are not triangles
to be formed between inpatients that stay on different wards,
and between inpatients that stay on the same wards but at
different times. This also makes it possible for links to form
between nodes with different degrees. The same occurs where
Δt increases when t_ol is held constant for low levels of t_ol. The
decay in C and r is a result of the skewed distribution of
hospitalization times (Fig. 2): a long-term patient A will form a triple
(but not a triangle) with the many pairs of short-term patients
whose hospitalizations do not overlap with each other's but
do overlap with A's. This situation, and the number of
inpatients who stay on more than one ward, will be more common
for larger time windows, causing C, and consequently r, to
decay over time.

That C increases with t_ol (for fixed Δt) may seem
counterintuitive: as t_ol grows, the network will have fewer edges,
and also fewer triangles. We have constructed a simple
agent-based model that shows that a prerequisite for this is the
observed skewed distribution of hospital stays (see Appendix C).



V. SUMMARY AND CONCLUSIONS 

Our study of a very large inpatient database shows that hos- 
pital systems characteristically have a very large variation in 
duration of hospital stays, which generates a correspondingly 
large variation in number of contacts. We have further shown 
that the clustering coefficient and assortative mixing depend 
greatly on sampling time and the length of time that two
inpatients must spend together for contact to be effected. Both
of these coefficients, C and r, become extremely large in our
real-life network when t_ol and Δt are large. This is alarming
because it has been shown that both a high level of clustering
and a high level of assortative interaction decrease the
epidemic threshold (|lj 0). Any strategy to intervene in
disease spread in a hospital environment must take into account
this departure from assumptions of random interaction and
homogeneous mixing. For infections with high transmissibility,
short incubation times and short duration of infectiousness, 
such as norovirus infections and SARS, our finding may be 
less important. However, for diseases such as tuberculosis or 
MRSA characterized by low transmissibility and long dura- 
tion of infectiousness, it becomes necessary to take this varia- 
tion into consideration because a high variance will lower the 
epidemic threshold. 

Our findings indicate that the individuals with many con- 
tacts are significant for the spread of infectious diseases with 
long duration of infectiousness. These high-risk individuals 
will probably be identifiable through hospital patient registra- 
tion systems, and should be the first to be targeted by contact 
tracing. The high level of clustering further indicates that it 
may be worth screening all inpatients that have spent time on 
the same ward as positive inpatients before and after the
positive inpatients were there. The high level of clustering makes
it reasonable to assume that more than one inpatient will be
infectious at the same time on the same ward, and consequently
that the disease would have existed among the inpatients on
the ward both before and after the actual inpatient in question
was on the ward.

FIG. 4 Mean duration of hospital stay for the inpatients on each
ward versus number of inpatients per year for the wards in the database.

APPENDIX A: Further statistics

The dataset contains information for 570,382 ward admis- 
sions, including date of admission, date of discharge, and 
ward identity. There were a total of 702 wards located at 
52 different geographical units such as hospitals and nursing 
homes. The mean daily number of patients on the wards for 
the two-year period varied between 1 and 69 (mean 10.05, 
SD 9.44). Wards with a large number of inpatients per year 
strongly tend toward shorter duration of inpatient hospital 
stays, and vice versa, as shown in Figure 4.

As described in Sect. III, we define a network as the
individuals who have been inpatients some time during the
sample time, Δt, and a contact as a link between two
individuals who have been inpatients on the same ward for a duration
of at least t_ol. Figures 5(A) and 5(B) show the number of nodes, N,
and the number of links, M, for nodes having at least one link with a
duration of at least t_ol.

The N and M surfaces show a large variation in absolute
size. The surface for the number of edges per vertex has
a similar shape. The ratio between the largest value and the
smallest value is, however, smaller by several orders of
magnitude.



The number of healthcare occasions and the number of
different wards visited during the period under study vary widely
for different values of t_ol (see Figure 7). The number of
separate healthcare occasions for all contacts, that is, for t_ol ≥ 1, in
particular exhibits a fat tail. This holds to some extent for the
number of visited wards as well. The individuals who had at
least one contact with a duration of at least 100 days are thus
considerably less mobile between the wards in the hospital
system than those who had not.

The dataset is associated with one known systematic bias 
in the sense that one single inpatient may be registered as an 
inpatient on two wards at the same time such as when an inpa- 
tient is moved for a short period but is expected to return. Our 
analyses show that one single individual is registered on two 
separate wards 6734 times. We have not been able to show 
that these biases have any significant effect on the results we 
are presenting in this paper and will therefore use the whole 
dataset in our analyses.

APPENDIX B: Notes on the distribution of hospitalization 
times 

In Figure 2 we have plotted the cumulative distribution of
t_dur for all healthcare occasions in 2001. This allows us to
plot the cumulative distribution in the interval 1 to 365 days
for all of these healthcare occasions without interference from
any finite size effects of the material in this interval. The plot
shows that the duration of hospital stays exhibits a skewed
power-law-like tail, p(t_dur) ~ t_dur^{-α}. We estimate the slope, α, in
the interval [t_min, 365] by fitting α in p(t_dur) = t_dur^{-α} / T, where

T = \sum_{i = t_min}^{365} i^{-\alpha} \qquad (B1)

is a normalization factor. A maximum likelihood procedure
was used for the estimation. The 95% confidence intervals
were estimated by bootstrapping. Figures 8(A) and 8(B) show
how α changes when t_min is increased.
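
The estimation procedure can be sketched as follows, assuming the discrete form p(t) = t^(-α) / T on the integers t_min to 365. The golden-section search, the percentile bootstrap, the function names and the synthetic sample are all assumptions made for illustration; the text does not state which optimizer or bootstrap variant was used.

```python
import math
import random

def fit_alpha(durations, t_min=1, t_max=365, lo=0.01, hi=6.0, tol=1e-6):
    """Maximum-likelihood exponent for p(t) = t**-alpha / T on t_min..t_max."""
    data = [t for t in durations if t_min <= t <= t_max]
    n = len(data)
    log_sum = sum(math.log(t) for t in data)
    support = range(t_min, t_max + 1)

    def log_likelihood(alpha):
        norm = sum(i ** -alpha for i in support)   # normalization factor T
        return -alpha * log_sum - n * math.log(norm)

    # Golden-section search; the log-likelihood is concave in alpha.
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if log_likelihood(x1) < log_likelihood(x2):
            a = x1
        else:
            b = x2
    return (a + b) / 2

def bootstrap_ci(durations, n_boot=200, seed=0, **fit_args):
    """Percentile 95% interval from resampling the data with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        fit_alpha(rng.choices(durations, k=len(durations)), **fit_args)
        for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Synthetic check: draw stays from a known exponent and try to recover it.
_rng = random.Random(1)
_values = list(range(1, 366))
_weights = [t ** -2.0 for t in _values]
sample = _rng.choices(_values, weights=_weights, k=2000)
alpha_hat = fit_alpha(sample)
```

Repeating the fit with a larger t_min, for example fit_alpha(sample, t_min=5), corresponds to the dependence on t_min described above.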

APPENDIX C: A model of contact networks of patients 

To answer the question of why C increases with t_ol (for fixed
Δt), we construct a simple agent-based model of a healthcare
system from first principles. Suppose a healthcare system of
N_w wards of equal capacity is intended to serve a population
of N individuals. Each day a non-hospitalized individual is
hospitalized with probability p_1 and will stay for a random time
t (drawn from some distribution P_t) on ward w (how the ward is
chosen is discussed below). After hospitalization the patient is either
transferred to another ward with probability p_2 for a new
duration t, or discharged. This dynamic, given a Δt and t_ol,
yields networks just as our real data did.

How shall we assign patients to the wards? The simplest
assumption is to choose the wards with equal probability. As
seen in Figure 9, this yields the shape of C and r seen in
Figure 3.







FIG. 5 The number of vertices (A) and number of edges (B), N and M, as functions of the overlap time t_ol and time window size Δt.

FIG. 6 The average number of edges per vertex, M/N, as a function
of the overlap time t_ol and time window size Δt.

FIG. 7 The cumulative distribution, p(x > X), for the number of
healthcare occasions per inpatient and the number of visited wards
per inpatient during the period 2001-2002.



One important feature is missing from this model: differ- 
ent specialty wards hospitalize patients for different durations. 
If we incorporate this, the curves stay qualitatively the same. 
From the model, we understand that for large overlap times 
the long-term patients form densely connected components — 
otherwise the network is empty. The model is insensitive to 
parameter values. A skewed P_t function is, however, needed.

FIG. 8 The best estimates of the slope α for different values of t_min for
the whole population (A) and for individuals younger than 65 years
old (B). The error bars are 95% confidence intervals generated by
bootstrapping.

FIG. 9 The assortative mixing (A) and clustering coefficients (B) as
functions of the overlap time t_ol for a simulated healthcare system.
For the "no diff." curves, patients are assigned to a random ward,
whereas for the "diff." curves, patients with a similar duration of hospital
stay share wards (which reproduces the functional forms of Fig. 3).
The simulation parameters are N = 10000, N_w = 50, p_1 = 0.02 / day,
p_2 = 1/3, Δt = 2500 days, and P_t ~ t^{-3}. The curves are averaged
over 10 runs of the algorithm.

The algorithm consists of the following steps repeated t
times (that is, each repetition corresponds to one day in
the simulation):

1. Go through all healthy (non-hospitalized) patients.
With a probability p_1 hospitalize a patient. The
duration of the hospitalization is given by P_t. Assign a ward
according to the methods listed below. In our
simulation, we choose P_t ~ t^{-3}. This is based on the observed
distribution of hospitalization times (see Figure 2).

2. Go through all newly discharged patients. With
probability p_2 re-hospitalize a patient. The duration of the
hospitalization is given by P_t. Assign a ward according
to the methods listed below.

3. If needed, construct the network according to the
method detailed in Sect. III.
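
The daily update described in steps 1 and 2 can be sketched as below. This is a minimal illustration rather than the code used for the paper: the function names, the record format, the truncation of the stay distribution at 365 days, the power-law exponent of 3, the uniform ward choice, and the small parameter values in the example run are all assumptions. The resulting stay records could then be turned into a network as described in Sect. III (step 3).

```python
import random
from itertools import accumulate

def make_power_law_sampler(exponent=3.0, t_max=365):
    """Return a function drawing integer stays with P(t) ~ t**-exponent (assumed form)."""
    values = list(range(1, t_max + 1))
    cum = list(accumulate(t ** -exponent for t in values))
    def draw(rng):
        return rng.choices(values, cum_weights=cum)[0]
    return draw

def simulate_stays(n_people, n_wards, p1, p2, days, draw_duration, rng):
    """Return a list of (patient, ward, admit_day, discharge_day) records."""
    discharge_day = {}   # patient -> day on which the current stay ends
    stays = []

    def admit(patient, day):
        length = draw_duration(rng)
        ward = rng.randrange(n_wards)   # uniform ward choice
        discharge_day[patient] = day + length
        stays.append((patient, ward, day, day + length))

    for day in range(days):
        # Step 1: every non-hospitalized individual is admitted with probability p1.
        for patient in range(n_people):
            if patient not in discharge_day and rng.random() < p1:
                admit(patient, day)
        # Step 2: patients whose stay ends today are re-hospitalized with
        # probability p2, otherwise they leave the system.
        for patient in [p for p, d in discharge_day.items() if d == day]:
            del discharge_day[patient]
            if rng.random() < p2:
                admit(patient, day)
    return stays

# Small illustrative run (much smaller than the reported simulation).
stays = simulate_stays(n_people=200, n_wards=5, p1=0.02, p2=1 / 3, days=100,
                       draw_duration=make_power_law_sampler(),
                       rng=random.Random(42))
```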

To assign a ward given a hospitalization time, we propose
two different methods. The first option is to select the ward
uniformly at random. Clearly, this will, on average, make
all wards equally full. This method is used for the "no diff."
curves in Figure 9. However, the hospitalization times are
very different: on some wards, the hospitalization time is
much longer than average; on others, people stay for very
short periods. To model this, we differentiate strictly between
the wards so that the patients on ward w_i always stay a shorter
time than the patients on ward w_{i+1}. We implement this by
generating N_rnd random numbers distributed according to P_t.
Then we sort these values in increasing order of t and
associate ward 1 with the t-values [t_1, ..., t_{s_1}], ward 2 with the
t-values [t_{s_1 + 1}, ..., t_{s_2}], and so on, such that the sum of
t-values is the same for all wards. During the iterations, a random
value of the array of random numbers is drawn and the
patient is assigned to the corresponding pre-assigned ward. We
use N_rnd = 10^6 for Figure 9. The same plot with N_rnd = 10^4
yields indistinguishable curves. This is the method used for
the "diff." curves in Figure 9. Finally, to obtain the curves
in Figure 9, we average the result of 20 runs of the algorithm
above.
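
The differentiated ward assignment can be sketched as follows. The function names are hypothetical, and the way block boundaries are placed (by the cumulative sum of durations before each value) is one possible reading of "the sum of t-values is the same for all wards"; with a finite sample the sums are only approximately equal in general.

```python
import random

def build_ward_table(durations, n_wards):
    """Sort sampled durations and split them into consecutive ward blocks.

    Returns (sorted_durations, ward_of), where ward_of[i] is the ward index
    pre-assigned to sorted_durations[i]. Block boundaries are placed so that
    each ward receives roughly the same total duration.
    """
    ts = sorted(durations)
    total = sum(ts)
    ward_of = []
    before = 0   # cumulative duration of all shorter stays
    for t in ts:
        ward_of.append(int(before * n_wards // total))
        before += t
    return ts, ward_of

def draw_stay(ts, ward_of, rng):
    """Draw one pre-generated duration together with its pre-assigned ward."""
    i = rng.randrange(len(ts))
    return ts[i], ward_of[i]

# Tiny example: eight sampled durations shared between two wards.
ts, ward_of = build_ward_table([4, 1, 2, 1, 4, 1, 2, 1], n_wards=2)
duration, ward = draw_stay(ts, ward_of, random.Random(0))
```

In the example the six shortest stays (total duration 8) go to the first ward and the two longest (also total 8) to the second, so short-stay and long-stay patients never share a ward.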